A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data

Authors

  • Faming Liang
  • Jinsu Kim
  • Qifan Song
Abstract

Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data with complex structures. However, their computer-intensive nature, which typically requires a large number of iterations and a complete scan of the full dataset at each iteration, precludes their use for big data analysis. In this paper, we propose the bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for adapting powerful MCMC methods to big data analysis: the full-data log-likelihood is replaced by a Monte Carlo average of log-likelihoods that are calculated in parallel from multiple bootstrap samples. The BMH algorithm has an embarrassingly parallel structure and avoids repeated scans of the full dataset across iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can generally be more efficient, as it can asymptotically integrate the information in the whole dataset into a single simulation run. The BMH algorithm is also very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. This is illustrated in the paper by the tempering BMH algorithm, which can be viewed as a combination of parallel tempering and the BMH algorithm. BMH can also be used for model selection and optimization by combining it with reversible jump MCMC and simulated annealing, respectively.
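
The core idea in the abstract, replacing the full-data log-likelihood with an average of bootstrap-sample log-likelihoods inside a Metropolis-Hastings step, might be sketched roughly as follows. This is an illustrative toy and not the authors' implementation: the Gaussian mean model, the flat prior, the random-walk proposal, the `n / m` rescaling, and all names (`bmh_sketch`, `gaussian_loglik`, `k`, `m`, `step`) are assumptions made for this example.

```python
import math
import random


def gaussian_loglik(theta, sample, sigma=1.0):
    """Log-likelihood of N(theta, sigma^2) for one sample, up to a constant."""
    return -sum((x - theta) ** 2 for x in sample) / (2.0 * sigma ** 2)


def bmh_sketch(data, k=10, m=100, n_iter=2000, step=0.05, seed=0):
    """Toy random-walk Metropolis-Hastings on a bootstrap-averaged log-likelihood."""
    rng = random.Random(seed)
    n = len(data)
    # Draw the k bootstrap samples (size m, with replacement) once, up front,
    # so that no iteration of the chain has to rescan the full dataset.
    boots = [[rng.choice(data) for _ in range(m)] for _ in range(k)]

    def surrogate(theta):
        # Monte Carlo average of the k bootstrap log-likelihoods; each term is
        # independent of the others and could be evaluated on its own worker.
        avg = sum(gaussian_loglik(theta, b) for b in boots) / k
        # ASSUMPTION: rescale by n / m so the surrogate is on the full-data
        # scale. The paper may treat this scaling differently.
        return (n / m) * avg

    theta = sum(boots[0]) / m  # start near the bulk of the data
    current = surrogate(theta)
    chain = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = surrogate(proposal)
        # Flat prior and symmetric proposal: accept with prob min(1, ratio).
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        chain.append(theta)
    return chain
```

Calling `bmh_sketch(data)` on a list of floats returns the sampled values of `theta`; discarding an initial burn-in portion and averaging the rest gives a rough posterior-mean estimate under these assumptions.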

Similar resources

Spatial count models on the number of unhealthy days in Tehran

Spatial count data are common in many sciences, such as environmental science, meteorology, geology and medicine. Spatial generalized linear models based on the Poisson (Poisson-lognormal spatial model) and binomial (binomial-logitnormal spatial model) distributions are often used to analyze discrete count data in which spatial correlation is observed. The likelihood function of these models i...

Embarrassingly parallel sequential Markov-chain Monte Carlo for large sets of time series

Bayesian computation crucially relies on Markov chain Monte Carlo (MCMC) algorithms. In the case of massive data sets, running the Metropolis-Hastings sampler to draw from the posterior distribution becomes prohibitive due to the large number of likelihood terms that need to be calculated at each iteration. In order to perform Bayesian inference for a large set of time series, we consider an al...

Bayesian Estimation of Parameters in the Exponentiated Gumbel Distribution

Abstract: The Exponentiated Gumbel (EG) distribution has been proposed to capture some aspects of the data that the Gumbel distribution fails to specify. In this paper, we estimate the EG's parameters in the Bayesian framework. We consider a 2-level hierarchical structure for the prior distribution. As the posterior distributions do not admit a closed form, we perform approximate inference by using ...

Bayesian and Classical Estimation of Stress-Strength Reliability for Inverse Weibull Lifetime Models

In this paper, we consider the problem of estimating stress-strength reliability for inverse Weibull lifetime models having the same shape parameters but different scale parameters. We obtain the maximum likelihood estimator and its asymptotic distribution. Since the classical estimator does not have an explicit form, we propose an approximate maximum likelihood estimator. The asymptotic confidenc...

Approximating Bayes Estimates by Means of the Tierney Kadane, Importance Sampling and Metropolis-Hastings within Gibbs Methods in the Poisson-Exponential Distribution: A Comparative Study

Here, we work on the problem of point estimation of the parameters of the Poisson-exponential distribution through the Bayesian and maximum likelihood methods based on complete samples. The point Bayes estimates under the symmetric squared error loss (SEL) function are approximated using three methods, namely the Tierney Kadane approximation method, the importance sampling method and the Metrop...

Journal title:
  • Technometrics : a journal of statistics for the physical, chemical, and engineering sciences

Volume 58, Issue 3

Pages: -

Publication date: 2016